8 research outputs found

    Streaming Graph Challenge: Stochastic Block Partition

    An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled to large graphs. Competitive benchmarks and challenges have proven to be an effective means of advancing the state of the art and fostering community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge, including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm, as well as metrics, with detailed documentation are available at GraphChallenge.org. Comment: To be published in the 2017 IEEE High Performance Extreme Computing Conference (HPEC).
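
    The statistical model behind the baseline algorithm is the stochastic block model, under which a partition is scored by how well block memberships explain the observed edge counts. A minimal sketch of that idea, assuming a simple non-degree-corrected model and toy data (this is illustrative only, not the challenge's baseline implementation):

```python
import math
from collections import Counter

def sbm_log_likelihood(edges, blocks):
    """Score a candidate partition under a simple (non-degree-corrected)
    stochastic block model: sum over block pairs (r, s) of
    m_rs * log(m_rs / (n_r * n_s)), where m_rs counts edges between
    blocks r and s, and n_r is the size of block r."""
    block_sizes = Counter(blocks.values())
    m = Counter()  # edge counts between directed block pairs
    for u, v in edges:
        m[(blocks[u], blocks[v])] += 1
    ll = 0.0
    for (r, s), m_rs in m.items():
        ll += m_rs * math.log(m_rs / (block_sizes[r] * block_sizes[s]))
    return ll

# Two dense communities {0,1,2} and {3,4,5} joined by one bridge edge:
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # matches the communities
bad  = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # mixes them up

# The community-respecting partition scores strictly higher:
assert sbm_log_likelihood(edges, good) > sbm_log_likelihood(edges, bad)
```

    Bayesian inference over this model amounts to searching for block assignments that maximize such a score (plus prior terms), which is what makes the moves of individual nodes independent enough to parallelize.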

    VAST Challenge 2015: Mayhem at Dinofun World

    A fictitious amusement park and a larger-than-life hometown football hero provided participants in the VAST Challenge 2015 with an engaging yet complex storyline and setting in which to analyze movement and communication patterns. The datasets for the 2015 challenge were large, averaging nearly 10 million records per day over a three-day period, with a simple, straightforward structured format. The simplicity of the format belied a complex wealth of features contained in the data that needed to be discovered and understood to solve the tasks and questions that were posed. Two Mini-Challenges and a Grand Challenge composed the 2015 competition. Mini-Challenge 1 contained structured location and date-time data for park visitors, against which participants were to discern groups and their activities. Mini-Challenge 2 contained structured communication data consisting of metadata about time-stamped text messages sent between park visitors. The Grand Challenge required participants to use both movement and communication data to hypothesize when a crime was committed and identify the most likely suspects from all the park visitors. The VAST Challenge 2015 received 74 submissions, and the datasets were downloaded, at least partially, from 26 countries.

    Visualization Evaluation for Cyber Security: Trends and Future Directions

    The Visualization for Cyber Security research community (VizSec) addresses longstanding challenges in cyber security by adapting and evaluating information visualization techniques for the cyber security domain. This research effort has created many tools and techniques that could be applied to improve cyber security, yet the community has not established unified standards for evaluating these approaches to predict their operational validity. In this paper, we survey and categorize the evaluation metrics, components, and techniques that have been utilized in the past decade of VizSec research literature. We also discuss existing methodological gaps in evaluating visualization in cyber security, and suggest potential avenues for future research in order to help establish an agenda for advancing the state of the art in evaluating cyber security visualization.

    GraphChallenge.org: Raising the Bar on Graph Analytic Performance

    The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of pre-parsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. Numerous submissions implemented the triangle counting challenge, resulting in over 350 distinct measurements. Analysis of these submissions shows that their execution time is a strong function of the number of edges in the graph, N_e, and is typically proportional to N_e^{4/3} for large values of N_e. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 10^8 edges processed per second for graphs with 10^8 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs. Comment: 7 pages, 6 figures; submitted to IEEE HPEC Graph Challenge. arXiv admin note: text overlap with arXiv:1708.0686
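
    The scaling law quoted above has a direct consequence: if execution time grows as N_e^{4/3}, the processing rate N_e / t falls as N_e^{-1/3}, so throughput degrades tenfold for every thousandfold growth in edges. A small worked example with a synthetic constant calibrated so that a 10^8-edge graph runs at the quoted 10^8 edges per second (these are illustrative numbers, not Graph Challenge measurements):

```python
# Calibrate t(N_e) = c * N_e^(4/3) so that t(1e8) = 1 second,
# i.e. a 1e8-edge graph is processed at 1e8 edges/second.
c = 1.0 / (1e8 ** (4 / 3))

def exec_time(n_edges):
    """Modeled execution time in seconds: t = c * N_e^(4/3)."""
    return c * n_edges ** (4 / 3)

def rate(n_edges):
    """Edges processed per second: N_e / t, which scales as N_e^(-1/3)."""
    return n_edges / exec_time(n_edges)

# At the calibration point the rate is 1e8 edges/second:
assert abs(rate(1e8) - 1e8) < 1.0
# A 1000x larger graph processes edges 10x more slowly per edge:
assert abs(rate(1e11) / rate(1e8) - 0.1) < 1e-6
```

    This is why the abstract stresses the need for new innovations on very large graphs: under the observed exponent, simply adding data makes every system slower per edge.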

    Collaborative Data Analysis and Discovery for Cyber Security

    In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources and providing next-generation analysts with increased visibility into the work of others. In current-generation cyber work, tools limit analysts' ability to collaborate, often relying on individual record keeping, which hinders their ability to reflect on their own work and transition analytic insights to others. While online collaboration technologies have been shown to encourage and facilitate information sharing and group decision making in multiple contexts, no such technology exists today in the cyber domain. Using visualization and annotation, CARINA leverages conversation and ad hoc thought to coordinate decisions across an organization. CARINA incorporates features designed to incentivize positive information-sharing behaviors, and provides a framework for incorporating recommendation engines and other analytics to guide analysts in the discovery of related data or analyses. In this paper, we present the user research that informed the development of CARINA, discuss the functionality of the system, and outline potential use cases. We also discuss future research trajectories and implications for cyber researchers and practitioners.

    Eventpad: Rapid malware analysis and reverse engineering using visual analytics

    Forensic analysis of malware activity in network environments is a necessary yet very costly and time-consuming part of incident response. Vast amounts of data need to be screened, in a very labor-intensive process, looking for signs indicating how the malware at hand behaves inside, e.g., a corporate network. We believe that data reduction and visualization techniques can assist security analysts in studying behavioral patterns in network traffic samples (e.g., PCAP). We argue that the discovery of patterns in this traffic can help us to quickly understand how intrusive behavior such as malware activity unfolds and distinguishes itself from the rest of the traffic. In this paper, we present a case study of the visual analytics tool EventPad and illustrate how it is used to gain quick insights into the analysis of PCAP traffic using rules, aggregations, and selections. We show the effectiveness of the tool on real-world data sets involving office traffic and ransomware activity.
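
    The rule-and-aggregation idea behind this kind of analysis can be sketched in miniature: rules map raw network events to symbolic labels, and runs of identical labels are collapsed so that repeated behavior (such as beaconing) stands out from the surrounding traffic. The event tuples, rule, and labels below are hypothetical toy data; this mimics the concept only and is not EventPad's implementation:

```python
# Each toy event is (timestamp, src, dst, protocol).
events = [
    (0.0, "10.0.0.5", "10.0.0.1", "DNS"),
    (0.1, "10.0.0.5", "198.51.100.7", "HTTP"),
    (0.2, "10.0.0.5", "198.51.100.7", "HTTP"),
    (0.3, "10.0.0.5", "198.51.100.7", "HTTP"),
    (0.4, "10.0.0.5", "10.0.0.1", "DNS"),
]

def label(event):
    """Rule: classify an event by protocol and destination scope."""
    _, _, dst, proto = event
    scope = "internal" if dst.startswith("10.") else "external"
    return f"{proto}/{scope}"

def collapse(labels):
    """Aggregate consecutive identical labels into (label, run_length) pairs,
    reducing a long event stream to a short, readable pattern."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return [(lab, n) for lab, n in runs]

pattern = collapse(label(e) for e in events)
# → [('DNS/internal', 1), ('HTTP/external', 3), ('DNS/internal', 1)]
```

    Even on five events, the collapsed pattern makes the repeated external HTTP burst between two internal DNS lookups immediately visible, which is the kind of data reduction the abstract argues for.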